Statistical Thesaurus Construction for a Morphologically Rich Language

نویسندگان

  • Chaya Liebeskind
  • Ido Dagan
  • Jonathan Schler
چکیده

Corpus-based thesaurus construction for Morphologically Rich Languages (MRL) is a complex task, due to the morphological variability of MRL. In this paper we explore alternative term representations, complemented by clustering of morphological variants. We introduce a generic algorithmic scheme for thesaurus construction in MRL, and demonstrate the empirical benefit of our methodology for a Hebrew thesaurus.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

ThesWB: A Tool for Thesaurus Construction from HTML Documents

Electronically available documents on the Web are exploding at an ever-increasing rate. Many Web documents, however, contain rich knowledge that describes the document's content. The Web can be viewed as a body of text containing two fundamentally different types of data: the contents and the tags. A tag is in HTML (Hyper-Text Markup Language) meta-data describing the layout and linking structu...

متن کامل

Improving Translation to Morphologically Rich Languages (Améliorer la traduction des langages morphologiquement riches) [in French]

Améliorer la traduction des langages morphologiquement riches While statistical techniques for machine translation have made significant progress in the last 20 years, results for translating to morphologically rich languages are still mixed versus previous generation rule-based systems. Current research in statistical techniques for translating to morphologically rich languages varies greatly ...

متن کامل

Exploration and Study of Chinese Thesaurus Automation Construction for Digital Libraries

The paper aims to explore Chinese thesaurus automation construction based on the freely available digital library resources. The key methods and study results are presented in the paper. The study adopted the technology of natural language processing to analysis the linguistics characteristics of terms, and combined with statistical analysis to extract the terms from technical literatures. Our ...

متن کامل

Rich Morphology Generation Using Statistical Machine Translation

We present an approach for generation of morphologically rich languages using statistical machine translation. Given a sequence of lemmas and any subset of morphological features, we produce the inflected word forms. Testing on Arabic, a morphologically rich language, our models can reach 92.1% accuracy starting only with lemmas, and 98.9% accuracy if all the gold features are provided.

متن کامل

Semi-Automatic Practical Ontology Construction by Using a Thesaurus, Computational Dictionaries, and Large Corpora

This paper presents the semi-automatic construction method of a practical ontology by using various resources. In order to acquire a reasonably practical ontology in a limited time and with less manpower, we extend the Kadokawa thesaurus by inserting additional semantic relations into its hierarchy, which are classified as case relations and other semantic relations. The former can be obtained ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012